n-best hypothesis
Game-Oriented ASR Error Correction via RAG-Enhanced LLM
Jiang, Yan, Luo, Yongle, Zhou, Qixian, Liu, Elvis S.
With the rise of multiplayer online games, real-time voice communication is essential for team coordination. However, general ASR systems struggle with gaming-specific challenges like short phrases, rapid speech, jargon, and noise, leading to frequent errors. To address this, we propose the GO-AEC framework, which integrates large language models, Retrieval-Augmented Generation (RAG), and a data augmentation strategy using LLMs and TTS. GO-AEC includes data augmentation, N-best hypothesis-based correction, and a dynamic game knowledge base. Experiments show GO-AEC reduces character error rate by 6.22% and sentence error rate by 29.71%, significantly improving ASR accuracy in gaming scenarios.
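The retrieval step of such a RAG-enhanced pipeline can be sketched as follows. The knowledge base entries, similarity threshold, and prompt wording are illustrative assumptions, not GO-AEC's actual implementation:

```python
# Minimal sketch: retrieve game-specific terms that are lexically close
# to the ASR n-best hypotheses, then build an LLM correction prompt.
from difflib import SequenceMatcher

GAME_KB = ["smoke grenade", "flashbang", "rush B", "eco round"]  # toy knowledge base

def retrieve_terms(hypotheses, kb, threshold=0.6):
    """Return KB entries whose surface form is similar to any hypothesis."""
    hits = []
    for entry in kb:
        for hyp in hypotheses:
            if SequenceMatcher(None, entry.lower(), hyp.lower()).ratio() >= threshold:
                hits.append(entry)
                break
    return hits

def build_prompt(hypotheses, terms):
    lines = [f"{i + 1}. {h}" for i, h in enumerate(hypotheses)]
    return ("Correct the ASR transcript using these game terms: "
            + ", ".join(terms) + "\nN-best hypotheses:\n" + "\n".join(lines))

nbest = ["rash bee now", "rush bee now", "rush B now"]
terms = retrieve_terms(nbest, GAME_KB)
prompt = build_prompt(nbest, terms)
```

A real system would replace the string-similarity retriever with phonetic or embedding-based retrieval over a dynamic knowledge base, as the abstract describes.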
LLM-based Generative Error Correction for Rare Words with Synthetic Data and Phonetic Context
Yamashita, Natsuo, Yamamoto, Masaaki, Kokubo, Hiroaki, Kawaguchi, Yohei
Generative error correction (GER) with large language models (LLMs) has emerged as an effective post-processing approach to improve automatic speech recognition (ASR) performance. However, it often struggles with rare or domain-specific words due to limited training data. Furthermore, existing LLM-based GER approaches rely primarily on textual information and neglect phonetic cues, which leads to over-correction. To address these issues, we propose a novel LLM-based GER approach that targets rare words and incorporates phonetic information. First, we generate synthetic data containing rare words for fine-tuning the GER model. Second, we integrate the ASR's N-best hypotheses along with phonetic context to mitigate over-correction. Experimental results show that our method not only improves the correction of rare words but also reduces the WER and CER across both English and Japanese datasets.
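One way to attach phonetic context to an N-best correction prompt is sketched below, using classic Soundex as a stand-in phonetic code; the paper's actual phonetic representation and prompt format are not specified here, so both are assumptions:

```python
def soundex(word):
    """Classic Soundex: first letter plus three digits encoding consonant classes."""
    codes = {}
    for group, d in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                     ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in group:
            codes[ch] = d
    word = word.lower()
    out = word[0].upper()
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h/w do not separate duplicate codes
        d = codes.get(ch, "")
        if d and d != prev:
            out += d
        prev = d
    return (out + "000")[:4]

def prompt_with_phonetics(nbest):
    """Annotate each hypothesis with per-word phonetic codes for the LLM."""
    lines = [f"{h}  [phonetic: {' '.join(soundex(w) for w in h.split())}]"
             for h in nbest]
    return ("Pick or correct the transcript; avoid changing words whose "
            "phonetics already match:\n" + "\n".join(lines))

p = prompt_with_phonetics(["call Rupert now", "call Robert now"])
```

Because "Rupert" and "Robert" share the code R163, the annotation signals the LLM that both hypotheses are phonetically plausible, discouraging it from rewriting the name wholesale.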
Device-Directed Speech Detection for Follow-up Conversations Using Large Language Models
Rudovic, Ognjen, Dighe, Pranay, Su, Yi, Garg, Vineet, Dharur, Sameer, Niu, Xiaochuan, Abdelaziz, Ahmed H., Adya, Saurabh, Tewfik, Ahmed
Follow-up conversations with virtual assistants (VAs) enable a user to interact seamlessly with a VA without repeatedly invoking it with a keyword (after the first query). Accurate Device-directed Speech Detection (DDSD) on these follow-up queries is therefore critical for a naturalistic user experience. To this end, we explore the use of Large Language Models (LLMs) and model the first query when making inferences about the follow-ups (based on the ASR-decoded text), either by prompting a pretrained LLM or by adapting a binary classifier on top of the LLM. In doing so, we also exploit ASR uncertainty when designing the LLM prompts. On a real-world dataset of follow-up conversations, we show that this approach yields large gains (20-40% reduction in false alarms at 10% fixed false rejects) thanks to the joint modeling of the previous speech context and ASR uncertainty, compared to modeling the follow-ups alone.
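A prompt that jointly encodes the first query and the ASR uncertainty of the follow-up might look like the sketch below; the wording and the confidence format are illustrative assumptions, not the paper's exact prompt design:

```python
# Sketch: build a DDSD prompt from the first (device-directed) query and
# the n-best hypotheses of the follow-up, each with an ASR confidence.
def ddsd_prompt(first_query, followup_nbest):
    hyp_lines = [f'- "{text}" (confidence {conf:.2f})'
                 for text, conf in followup_nbest]
    return (
        f'First query to the assistant: "{first_query}"\n'
        "ASR n-best hypotheses for the follow-up utterance:\n"
        + "\n".join(hyp_lines)
        + "\nIs the follow-up directed at the assistant? Answer yes or no."
    )

prompt = ddsd_prompt(
    "set a timer for ten minutes",
    [("and remind me at noon", 0.81), ("under mine at noon", 0.12)],
)
```

Listing several hypotheses with confidences lets the LLM discount low-confidence decodings (often background speech) instead of trusting a single 1-best string.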
ProGRes: Prompted Generative Rescoring on ASR n-Best
Tur, Ada Defne, Moumen, Adel, Ravanelli, Mirco
Large Language Models (LLMs) have shown their ability to improve the performance of speech recognizers by effectively rescoring the n-best hypotheses generated during the beam search process. However, the best way to exploit recent generative instruction-tuned LLMs for hypothesis rescoring is still unclear. This paper proposes a novel method that uses instruction-tuned LLMs to dynamically expand the n-best speech recognition hypotheses with new hypotheses generated by appropriately prompted LLMs. Specifically, we introduce a new zero-shot method for ASR n-best rescoring that combines confidence scores, LLM sequence scoring, and prompt-based hypothesis generation. We compare Llama-3-Instruct, GPT-3.5 Turbo, and GPT-4 Turbo as prompt-based generators, with Llama-3 as the sequence-scoring LLM. We evaluate our approach with different speech recognizers and observe significant relative improvements in word error rate (WER), ranging from 5% to 25%.
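The score-combination step of such a rescorer can be sketched as a simple log-linear interpolation; the LLM log-probabilities below are hard-coded stand-ins for a real sequence scorer such as Llama-3, and the interpolation weight is an assumption:

```python
import math

# Sketch: rescore an n-best list by interpolating the ASR confidence
# (in log space) with an LLM sequence log-probability.
def rescore(nbest, llm_logprob, weight=0.5):
    """nbest: list of (text, asr_confidence in (0, 1]). Returns the best text."""
    scored = [
        (weight * math.log(conf) + (1 - weight) * llm_logprob[text], text)
        for text, conf in nbest
    ]
    return max(scored)[1]

# Stub LLM scores: the fluent hypothesis gets a much higher log-probability.
llm_logprob = {"recognize speech": -4.0, "wreck a nice beach": -9.0}
best = rescore(
    [("wreck a nice beach", 0.55), ("recognize speech", 0.45)],
    llm_logprob,
)
```

Even though the ASR system ranks the garbled hypothesis first, the LLM score flips the decision; ProGRes additionally injects LLM-generated hypotheses into the list before this step.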
Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models
Tang, Zhiyuan, Wang, Dong, Huang, Shen, Shang, Shidong
Recent studies have demonstrated the efficacy of large language models (LLMs) in error correction for automatic speech recognition (ASR). However, much of the research focuses on the English language. This paper redirects the attention to Chinese. Firstly, we construct a specialized benchmark dataset aimed at error correction for Chinese ASR with 724K hypothesis-transcription pairs, named the Chinese Hypotheses Paradise dataset (ChineseHP), which covers a wide range of scenarios and presents significant challenges. Subsequently, we conduct a preliminary evaluation using the dataset for both direct prompting and fine-tuning of pre-trained LLMs. Furthermore, we propose a straightforward method of Pinyin regularization for prompts, which involves transcribing the text hypotheses directly into Pinyin. The experimental results reveal that Pinyin regularization consistently enhances the error-correcting ability of LLMs compared with prompts without regularization. The dataset is available on the website.
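The idea can be sketched as follows; the tiny character-to-Pinyin table and the prompt wording are illustrative assumptions (a real system would use a full Pinyin dictionary, e.g. a library like pypinyin), not the paper's exact setup:

```python
# Sketch: append the Pinyin reading of an ASR hypothesis to the prompt,
# so the LLM can prefer homophone-preserving corrections.
PINYIN = {"你": "ni3", "好": "hao3", "号": "hao4", "世": "shi4", "界": "jie4"}

def pinyinize(text):
    """Map each character to its Pinyin; pass through unknown characters."""
    return " ".join(PINYIN.get(ch, ch) for ch in text)

def prompt_with_pinyin(hypothesis):
    return (f"ASR hypothesis: {hypothesis}\n"
            f"Pinyin: {pinyinize(hypothesis)}\n"
            "Correct the hypothesis, keeping the Pinyin consistent.")

p = prompt_with_pinyin("你号世界")  # 号 (hao4) is a homophone error for 好 (hao3)
```

Because Chinese ASR errors are overwhelmingly homophone substitutions, exposing the Pinyin steers the LLM toward same-sound characters rather than arbitrary rewrites.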
Transformer-based Model for ASR N-Best Rescoring and Rewriting
Kang, Iwen E., Van Gysel, Christophe, Siu, Man-Hung
Voice assistants increasingly use on-device Automatic Speech Recognition (ASR) to ensure speed and privacy. However, due to resource constraints on the device, queries pertaining to complex information domains often require further processing by a search engine. For such applications, we propose a novel Transformer-based model capable of rescoring and rewriting by exploring the full context of the N-best hypotheses in parallel. We also propose a new discriminative sequence training objective that works well for both rescoring and rewriting tasks. We show that our Rescore+Rewrite model outperforms the Rescore-only baseline and achieves up to an average 8.6% relative Word Error Rate (WER) reduction over the ASR system on its own.
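One common discriminative sequence objective for n-best rescoring is minimum expected WER (MWER-style): the expected per-hypothesis error under a softmax over model scores. Whether this matches the paper's exact objective is an assumption; the sketch below only illustrates the general shape of such losses:

```python
import math

def word_errors(hyp, ref):
    """Levenshtein distance over words (edit count, not yet normalized)."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))
    for i, hw in enumerate(h, 1):
        cur = [i]
        for j, rw in enumerate(r, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (hw != rw)))  # substitution
        prev = cur
    return prev[-1]

def expected_wer_loss(nbest_scores, hyps, ref):
    """Expectation of edit count under softmax(scores); lower is better."""
    exps = [math.exp(s) for s in nbest_scores]
    z = sum(exps)
    return sum((e / z) * word_errors(h, ref) for e, h in zip(exps, hyps))

loss = expected_wer_loss([2.0, 0.0],
                         ["play some music", "play sum music"],
                         "play some music")
```

Minimizing this loss pushes probability mass toward low-error hypotheses across the whole n-best list, rather than only fitting the single reference as cross-entropy would.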
LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
Ghosh, Sreyan, Kumar, Sonal, Seth, Ashish, Chiniya, Purva, Tyagi, Utkarsh, Duraiswami, Ramani, Manocha, Dinesh
Visual cues, like lip motion, have been shown to improve the performance of Automatic Speech Recognition (ASR) systems in noisy environments. We propose LipGER (Lip Motion aided Generative Error Correction), a novel framework for leveraging visual cues for noise-robust ASR. Instead of learning the cross-modal correlation between the audio and visual modalities, we make an LLM learn the task of visually-conditioned (generative) ASR error correction. Specifically, we instruct an LLM to predict the transcription from the N-best hypotheses generated using ASR beam search, further conditioned on lip motions. This approach addresses key challenges in traditional AVSR learning, such as the lack of large-scale paired datasets and difficulties in adapting to new domains. We experiment on 4 datasets in various settings and show that LipGER improves the Word Error Rate in the range of 1.1%-49.2%. We also release LipHyp, a large-scale dataset of hypothesis-transcription pairs additionally equipped with lip motion cues, to promote further research in this space.